ABSTRACT

From August 1970 through January 1971, while this institution
was using the TSS/360 Time-Sharing System, the user community
observed that the performance of the system was poor compared
to the previously used time-sharing system, CP/67 (version 3
from Cambridge Research Center). For this reason, the problem
of improving TSS/360 performance was undertaken as a thesis
project. Specifically, the improvements consist of an increase
in system performance (responsiveness and throughput) obtained
by judiciously adjusting the parameters of the TSS/360
Table-Driven Scheduler in accordance with the Principles of
Balanced-Core Time and Working Set Size.

A number of test runs were made employing different schedule
tables, and the results are given. A set of benchmark programs
(or script), characteristic of a "typical" or "realistic" load
at this installation, was developed and used with these tests.

I.  INTRODUCTION

Since the initial release of the time-shared operating
system, TSS/360, in October 1967, performance has improved
significantly with each subsequent release. However, during
the period from August 1970 to February 1971, the Naval
Postgraduate School converted from the CP/67 time-sharing
system to TSS/360 and found the new system undesirable to the
user community in terms of system performance (responsiveness
and throughput). Because of its poor performance, TSS/360 was
short-lived at the school and was never given an opportunity,
through testing and evaluation procedures, to indicate its
worth and future use as a well-performing time-sharing
utility.

The objective of the research for this thesis was to
find ways of improving the performance of TSS/360 at the
Naval Postgraduate School.

A review of the available literature on the TSS/360 system
suggested that the key area for study and work was the
scheduling algorithm. At first a simulation model of the
TSS/360 scheduling algorithm looked like a fruitful area of
endeavor; however, this approach was abandoned because of the
time required to build such a detailed simulation model: it
had taken John McCredie and Steven Schlesinger of
Carnegie-Mellon [Ref.1] about a year to write such a model.
They describe a modular simulation model designed to aid in
determining the value of entries in the TSS/360 schedule
table. They showed that a useful model can be designed to
answer a limited set of questions about a complex system
without detailed modeling of all system components.

Another  alternative,  and  the  one  that  was  finally  pur- 
sued, investigated,  and  tested,  was  that  of  methodically 
altering  the  parameters  of  the  TSS/360  Table-Driven  Scheduler 
to  achieve  optimum  system  performance  for  the  particular 
IBM  360/67  hardware  configuration  available. 

In order to test and evaluate the performance of TSS/360,
an evaluation based on five test runs with different schedule
tables, it was first necessary to construct a set of test
programs (a benchmark or script) that would be representative
of a realistic load on the system. This alone was a difficult
task since, in a time-sharing environment, many user programs
are in contention for similar system resources and, at any
particular time, there could be many demands or requests for
a particular resource.

Another objective of this paper is to compile the principles
and concepts from the available literature on the performance
of time-sharing systems that apply to TSS/360 and to show by
experimental tests that these principles and concepts improve
system performance.


II.  NATURE  OF  THE  PROBLEM 

When this institution purchased the IBM 360/67 computing
system in 1968, the TSS/360 time-sharing operating system was
not yet available (with the bugs removed), but the future
employment and implementation of TSS was the big factor and
sales promotion feature in purchasing the IBM 360/67 hardware
configuration. As an alternative, CP/67 (version 3 from
Cambridge Research Center) was used successfully for nearly
two years. Then an announcement was made to the user community
that in August 1970 the IBM 360/67 would be operated as it
was originally intended and that TSS/360 would replace the
CP/67 time-sharing system. Prior to implementation, TSS/360
was debugged and tested by the computer facility programming
staff; however, little consideration was given to tuning the
system to the job load of the Naval Postgraduate School
environment.

As previously stated, the TSS/360 time-sharing system
was used for about six months, during which time the
performance was quite unacceptable to the user community. It
was observed that heavy paging users could ruin the
performance; i.e., a few users manipulating large matrices or
having many subroutines not properly linked could decrease the
responsiveness to the other users. It was for this reason that
this thesis project was initiated.


The two basic approaches that have been used for
investigation of existing time-sharing systems are the
analytic and simulation techniques. The analytic approach
was the technique used here to improve the system performance
of TSS/360. By methodically adjusting the parameters of the
TSS/360 Table-Driven Scheduler using the principles of
Balanced-Core Time and Working Set Size, improvement of the
performance of the system can be achieved. Walter J.
Doherty [Ref.2] showed that the performance of the Release 4
Schedule Table of TSS/360 at the T.J. Watson Research Center
was dramatically improved in a three-month period.

III.  PRINCIPLES  AND  CONCEPTS 


The principles and concepts discussed in this section
are a compilation of the available literature regarding the
improvement of performance of time-sharing systems as they
pertain to TSS/360.

A.   PERFORMANCE 

Performance, as appraised by Calingaert [Ref.3], does not
exist as an independent entity. The concept of performance
can have a broad spectrum of meaning to different classes of
people. Fundamentally, however, the performance of a computer
is defined as the degree to which a computing system meets
the expectations of the person involved with it. Some of the
terms that are often included as aspects of performance are
responsiveness, throughput, turn-around time, availability,
reliability, number of terminals supported, CPU utilization,
channel and device utilization, and efficiency.

To a user of TSS/360 sitting at a terminal, the ability
of the system to respond to his commands is his predominant
view of performance [Ref.4]. A terminal user does not care
if only one person or a hundred people are using the system
simultaneously with him so long as the user thinks that
there is a complete and dedicated computer at his disposal
to provide certain services to him. A user would be much
more irritated if he expected a TSS/360 edit request to
respond in three seconds but it took five seconds than if he
expected a response in ten minutes to some complex
mathematical equation but it took thirty minutes. In other
words, the system should be much more responsive to those
requests to which a user expects an immediate reply than to
those requests during which the user knows that his attention
can be turned elsewhere. (He could execute these programs in
the background batch operation if the response is too slow.)
This was a primary assumption that was made while setting out
to improve TSS/360 performance.

It is most important to a system manager to know the
number of terminals that TSS/360 can support, and it is also
important to consider the categories of work that the
terminal users are doing. As Doherty points out in his paper,
an intuitively obvious but rarely mentioned concept is that
for some categories of trivial work, the number of terminals
receiving adequate response may increase only after a
threshold of human performance is reached. In other words, if
the system is responding at a rate slower than a person's
response time, any initial improvements in system performance
will first result in the user's getting more work done; and
only then will the system be able to handle more users at
that level of responsiveness. By allowing longer delays in
processing long-running programs as the load increases, it
is possible to ensure that the very short jobs will constantly
be provided with a fast response.

B.   FOLDING  PROGRAMS 

Sayre [Ref.5] states: "By the unfolded form of a program
we mean the form a program would take if it had available to
it a large enough uniform memory to hold both itself and its
data.... On the folded forms the addresses have been
rearranged -- folded-to-fit into the smaller address space
actually available." In TSS/360, unfolded forms of programs and
data exist in virtual memory. When a program is executed,
portions  of  the  program  and  its  data  are  brought  automatically 
into  main  memory  for  execution,  which  will  result  in  automa- 
tic folding  of  the  program  if  its  complete  execution  space 
requirements  are  larger  than  the  main  memory  available  to 
hold it. McCredie [Ref.6] noted in his paper that excessive
overhead and long delays while pages are transferred
into and out of core are two potential dangers of paging
designs. It is important to fold a program into as small a
space as possible to prevent a degenerate situation called
"thrashing" from occurring due to an unnatural folding.
"Thrashing,"  as  Denning  [Ref.7]  states,  may  also  occur  when 
a  page  is  pushed  from  core  to  make  room  for  another,  but  then 
is  demanded  again  and  brought  back  into  core.   Many  programs 
can  reach  this  state,  and  the  paging  rate  can  get  so  high 
that  all  productive  work  ceases.   It  is  important  to  main- 
tain a  high  degree  of  folding  since  it  permits  many  programs 
to  be  folded  into  main  core  simultaneously,  thereby  providing 
a  potentially  significant  increase  in  the  level  of  multi- 
programming.  The  dynamic  relocation  hardware  available  on 
the  IBM  360/67  makes  the  automatic  folding  concept  possible. 

C.   LOCALITY  OF  REFERENCE 

The  program  performance  on  any  paging  system  is  directly 
related  to  its  page  demand  characteristics.   A  program  which 
behaves  poorly  accomplishes  little  computation  on  the  CPU 
before  making  a  reference  to  a  page  of  its  virtual  memory 
that is on back-up storage, and thus it spends a good deal of
time  in  waiting  for  pages  to  be  read  into  core  memory.   A 
program  which  behaves  well  references  storage  in  a  more 
acceptable  fashion,  utilizing  the  CPU  longer  before  referenc- 
ing a  page  which  must  be  brought  in  from  back-up  storage. 
This characteristic of storage referencing is often referred
to as a program's locality of reference and is discussed in
Brawn and Gustavson's paper [Ref.8]. A program's locality of
reference will therefore influence the degree of folding to
which that program can be subjected with a minimal influence
on its performance. Doherty has shown that a program with
good locality will run more efficiently in a small execution
space than one with poor locality.
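
The effect of locality on paging can be illustrated with a
small simulation. The sketch below is not taken from the
sources cited above: it assumes a least-recently-used (LRU)
replacement rule, a fixed number of core page frames, and two
hypothetical reference strings of equal length, one with good
locality and one with poor locality.

```python
from collections import OrderedDict


def count_page_faults(references, frames):
    """Count page faults for a reference string under LRU replacement.

    LRU is an assumption made for illustration only; the text does not
    state which replacement rule TSS/360 used.
    """
    resident = OrderedDict()  # pages in core, least recently used first
    faults = 0
    for page in references:
        if page in resident:
            resident.move_to_end(page)  # mark as most recently used
        else:
            faults += 1
            if len(resident) >= frames:
                resident.popitem(last=False)  # evict least recently used
            resident[page] = None
    return faults


# Hypothetical reference strings over 8 pages, 400 references each.
# Good locality: stays on two pages for a long time before moving on.
good_locality = [p for base in range(0, 8, 2)
                 for _ in range(50)
                 for p in (base, base + 1)]
# Poor locality: sweeps cyclically through all 8 pages.
poor_locality = [p for _ in range(50) for p in range(8)]

good_faults = count_page_faults(good_locality, frames=4)
poor_faults = count_page_faults(poor_locality, frames=4)
```

With only four page frames, the well-behaved string faults
once per page, while the cyclic sweep faults on every
reference, which is the degenerate behavior the text calls
thrashing.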

D.   WORKING  SET  AND  WORKING  SET  SIZE 

P.J. Denning [Refs.9 and 10] has investigated working set
models with regard to program behavior in a virtual memory
environment such as that of the IBM 360 Model 67. The working
set W(t,T) of a program is the set of pages referenced in
the T page references immediately prior to time t. As time
progresses, W(t,T) may or may not change; however, the
better the program's locality of reference, the less likely
it is that W(t+1,T) ≠ W(t,T). From Denning's paper, it
appears natural to try to fold a program in such a way that
the program's working set for a given time interval fits
entirely in core memory. Reports of Fine, Jackson, and
McIsaac [Ref.11] provide some experimental evidence that
the working set concept is a reasonable assumption for
program paging behavior. Denning defines the working set size
S(t,T) of a program, at time t, as the number of pages
contained in the working set W(t,T). Therefore, it is possible
to have the working set size remain unchanged and have the
working set change. It appears natural to try to refold the
program whenever its working set changes but, as Doherty
indicates in his paper, this is difficult to do since it is not
known in advance just when the working set is changing. In
most paging systems, however, a working set size change is more
easily detectable; hence, it is possible to detect working
set changes at least when the working set size changes.
Doherty describes a method for doing this, and his method is
outlined below. The dynamic relocation hardware of the
Model 67 system makes the application of this concept
possible.
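
The definitions of W(t,T) and S(t,T) can be made concrete
with a short sketch. It assumes, for illustration only, that
time t is measured in page references, so that t indexes a
recorded reference string; the function names and the sample
string are hypothetical.

```python
def working_set(references, t, T):
    """W(t, T): distinct pages among the T references just before time t."""
    start = max(0, t - T)
    return set(references[start:t])


def working_set_size(references, t, T):
    """S(t, T): number of pages contained in W(t, T)."""
    return len(working_set(references, t, T))


# Hypothetical reference string (page numbers in order of reference).
refs = [1, 2, 1, 3, 4, 3]

early = working_set(refs, t=3, T=3)   # pages referenced in refs[0:3]
late = working_set(refs, t=6, T=3)    # pages referenced in refs[3:6]
```

In this example the two working sets contain different pages
but have the same size, which is the case noted in the text
where the working set changes while the working set size
does not.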

Using the concepts of working set, working set size, and
locality of reference, Doherty states:

"During  a  single  interaction  between  a  user  at  a  terminal 
and TSS/360, several programs are usually executed for
that  user.   Thus  for  the  virtual  execution  time  which 
spans  this  interaction,  the  working  set  size  may  or  may 
not  change;  however,  the  working  set  will  almost  always 
change several times.  Furthermore, for those programs
having  good  locality  of  reference,  the  working  set  size 
during  any  one  time  slice  will  usually  be  much  smaller 
than  the  working  set  size  for  the  whole  interaction  time 
interval.   And,  in  addition,  the  maximum  working  set  size 
for  all  the  time  slices  will  probably  always  be  smaller 
than  the  working  set  size  for  the  whole  interaction  time 
interval.   For  those  programs  having  poor  locality  of 
reference,  the  working  set  size  for  each  time  slice  may 
frequently  approach  the  working  set  size  for  the  entire 
interaction  time  interval.   Good  locality  relates  more  to 
the  rate  at  which  new  pages  enter  W(t,T)  than  to  its 
actual  size." 

E.   BALANCED  CORE  TIME 

From  the  previous  discussion,  programs  having  poor  local- 
ity of  reference  and  a  large  working  set  size  would  greatly 
reduce  the  level  of  multiprogramming  if  allowed  to  remain  in 
core  for  very  long  periods  of  time.   This  result  would  affect 
throughput  and  responsiveness,  since  any  new  demands  for  ser- 
vice could  not  be  honored  quickly  because  core  would  be  tied 
up. The Principle of Balanced-Core Time states that the
length of the time slice in terms of virtual CPU execution
time for any one task is inversely proportional to the
working set size in that interval. Therefore, this concept
will allow good locality programs to progress very rapidly,
whereas it will minimize the elapsed time that any large
program (large working set size) can tie up core memory. In
other words, a minimum time slice length will be set for
programs with large S(t,T) and poor locality to prevent
paging overhead from dominating the system. In order to
compensate for this compromise, the duration between large
program time slices will be made much longer than the
duration between time slices for smaller working set size
programs. As a result, the level of multiprogramming and
responsiveness will increase since more core is available
more often. In addition, the degree of CPU utilization will
increase.
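
The inverse relation between time slice length and working
set size, together with the minimum slice, might be sketched
as follows. The constant of proportionality and the minimum
slice used here are hypothetical values chosen only to make
the arithmetic visible; they are not taken from any schedule
table discussed in this paper.

```python
def time_slice_ms(working_set_pages, core_time_product=3200.0,
                  min_slice_ms=50.0):
    """Time slice inversely proportional to working set size.

    core_time_product (pages x milliseconds) and min_slice_ms are
    illustrative assumptions, not TSS/360 values.
    """
    if working_set_pages <= 0:
        raise ValueError("working set size must be positive")
    proportional = core_time_product / working_set_pages
    # The floor keeps paging overhead from dominating very large programs.
    return max(min_slice_ms, proportional)


small_program = time_slice_ms(16)    # small working set, long slice
medium_program = time_slice_ms(32)
large_program = time_slice_ms(128)   # large working set, held at the floor
```

Under these assumed values, halving the working set size
doubles the slice, and the largest program receives only the
minimum slice.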


IV.  TSS/360  TABLE-DRIVEN  SCHEDULER 


The table-driven scheduler [Refs.12 and 13] is an
algorithm which schedules and dispatches tasks within the
multiprogrammed, time-shared environment. More specifically,
the scheduler consists of a set of programs in the resident
supervisor of TSS/360 used for scheduling, together with a
static and resident table containing a variable number
(256 maximum) of 28-byte entries. The 28-byte entries are
called levels of the schedule table or Schedule Table
Entries (STE). Each level of the schedule table contains
sufficient information to completely control the execution of
a task. The format of the schedule table entry is depicted in
Figure 1.

Figure 1. Contents of the Schedule Table Entry

Each task which enters the system has another table,
called the Task Status Index (TSI), to describe itself to the
system. Each TSI has a pointer to a level in the schedule
table. Therefore, by changing the value of that pointer, a
task will be given a completely new set of scheduling
parameters.
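
The relationship between a TSI and the schedule table can be
sketched as a task record holding an index into a table of
entries. The three fields shown are hypothetical
simplifications for illustration; the actual 28-byte entry
contains the fields listed later in this section and
described in Appendix A.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScheduleTableEntry:
    """Simplified stand-in for one level of the schedule table."""
    priority: int        # hypothetical field
    quantum_ms: int      # hypothetical field
    max_core_pages: int  # hypothetical field


@dataclass
class TaskStatusIndex:
    """Simplified stand-in for a TSI: it points at one table level."""
    task_id: int
    level: int  # index of the current schedule table level

    def parameters(self, table):
        return table[self.level]


# Hypothetical two-level table.
table = [
    ScheduleTableEntry(priority=1, quantum_ms=100, max_core_pages=16),
    ScheduleTableEntry(priority=5, quantum_ms=50, max_core_pages=64),
]

task = TaskStatusIndex(task_id=7, level=0)
before = task.parameters(table)
task.level = 1                  # changing the pointer is the only update
after = task.parameters(table)  # the task now has a new parameter set
```

Because the task stores only the pointer, a single assignment
gives it a completely different set of scheduling parameters,
and the table itself remains static.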

All TSI's in the system are chained together on one of
two lists called the active and inactive lists. The active
list has two logical subdivisions called the dispatchable
and eligible lists. The dispatchable list consists of
tasks that occupy core storage, are waiting for the CPU, and
in most cases have a Scheduled Start Time (SST) less than
the Master Clock (MC). When the SST of a task is less than
the Master Clock, the task is said to be behind schedule.
Tasks in the dispatchable list are ordered according to
their status as "execute bound" or "I/O bound." Those with
heavy paging demands (I/O bound) are dispatched first.

The eligible list consists of tasks which are waiting for
entry to the dispatchable list, i.e., which are ready to
execute but have not yet been brought into main storage. These
tasks are ordered by priority with the lowest priority number
first on the list.

The  inactive  list  consists  of  tasks  waiting  on  long 
delay  type  stimuli,  such  as  a  terminal  interrupt.   These 
tasks,  which  are  in  AWAIT  or  TWAIT  status,  are  incapable  of 
continuing  execution  until  a  particular  interruption  occurs. 
Figure 2 depicts the movement of tasks among these three lists.

Figure 2. Maintenance of TSI Lists
[Diagram not reproduced. Surviving transition labels: "task reaches normal or forced ..." (label truncated), "task becomes dispatchable", "task leaves AWAIT or TWAIT status".]

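
The ordering rules stated for the eligible and dispatchable
lists, and the behind-schedule test, can be sketched as
follows. The Task record and its field names are
hypothetical, and the real supervisor maintains chained TSI
lists rather than sorting Python lists.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """Hypothetical, simplified view of a task on the active list."""
    name: str
    priority: int          # lower number means earlier on the eligible list
    scheduled_start: int   # Scheduled Start Time (SST), arbitrary clock units
    io_bound: bool = False


def order_eligible(tasks):
    """Eligible list: lowest priority number first."""
    return sorted(tasks, key=lambda task: task.priority)


def order_dispatchable(tasks):
    """Dispatchable list: I/O bound tasks ahead of execute bound tasks."""
    return sorted(tasks, key=lambda task: not task.io_bound)


def is_behind_schedule(task, master_clock):
    """A task is behind schedule when its SST is less than the Master Clock."""
    return task.scheduled_start < master_clock


tasks = [
    Task("compile", priority=3, scheduled_start=120),
    Task("edit", priority=1, scheduled_start=90, io_bound=True),
    Task("matrix", priority=2, scheduled_start=150),
]
eligible_names = [task.name for task in order_eligible(tasks)]
dispatch_names = [task.name for task in order_dispatchable(tasks)]
```

Python's sort is stable, so tasks with the same I/O status
keep their original relative order in this sketch.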
The  schedule  table  controls  the  order  in  which  tasks 
are  brought  into  the  dispatchable  list  and  the  conditions 
under  which  the  task  will  leave  the  dispatchable  list. 

The  fields  of  each  Schedule  Table  Entry  (STE)  can  be 
classified  into  six  logical  areas. 

The  first  is  a  set  of  fields  that  control  dispatching, 
i.e.,  the  order  in  which  tasks  move  from  the  eligible  to  the 
dispatchable  list  (STEPRIOR,  STEDELTA,  STERCMP). 

The second is a set of fields that provide limits that
determine when a task shall be time sliced and leave the
dispatchable list (STETSVAL, STEQUANT, STEMAXCR, STEAWTEX,
STEPRMT).

Third is a set of fields that specify the level transition
that will be made when the respective limit or stimulus
has been reached (STEPULSE, STETSEND, STEMPRE, STETWAIT,
STEAWAIT, STEHLCK, STELCHL, STEWLCK, STECWO, STELCF, STEPRJ3,
STENSL).

Next  is  one  field  which  can  stimulate  a  change  in  the 
order  of  tasks  on  the  dispatchable  list  (STEMRQ). 

Fifth  is  a  set  of  fields  which  allow  the  resident 
supervisor  to  release  some  of  a  task's  pages  rather  than 
time  slice  the  task  (STEST,  STESRI). 

Finally,  there  is  a  field  which  can  override  the  system 
calculated drum share of private pages for a task (STEDSH).

Appendix  A  contains  a  description  of  each  of  the  fields 
or  parameters  within  a  schedule  table  entry. 

A.   STRUCTURING  OF  SCHEDULE  TABLE  ENTRIES 

The scheduling principles and concepts previously discussed
allow a wide spectrum of scheduling strategies to be
implemented by altering only the entries within the schedule
table.

In constructing the schedule tables according to the
table scheduling strategies, different sets of levels are
grouped according to some primary goals of scheduling.
Several  particular  programs  (tasks)  are  treated  differently 
than  other  programs,  e.g.,  system  operator  task,  bulk  I/O 
task,  logon,  and  logoff.   Figure  3  shows  an  example  of  a 
schedule  table.   All  other  programs  are  divided  into  the 
interactive  and  batch  categories.   In  general,  the  same  sets 
of  levels  exist  for  both  kinds  of  programs,  except  that  inter- 
active programs  have  priority  over  batch  programs;  that  is, 
interactive  programs,  initially,  have  a  greater  urgency  to 
start  than  do  the  batch.   The  number  of  batch  programs  al- 
lowed to  run  simultaneously  is  arbitrarily  restricted  so 
that  adequate  space  will  be  available  for  anticipated  inter- 
active programs.   The  interactive  sets  of  table  levels  are 
grouped  according  to  the  following: 

1.   The  Starting  Set 

The starting set of table levels is used to handle
new  inputs  from  the  terminal.   The  functions  of  this  set  of 
table  levels  are  to  facilitate  a  rapid  reply  to  the  terminal, 
if  possible,  and  to  make  an  initial  judgment  of  the  present 
working  set  size  of  longer  running  programs,  so  that  the  best 
entrance  to  the  looping  set  of  table  levels  can  be  chosen  for 
the  particular  program. 

Figure 3. Schedule Table Example

To accomplish this, several successive table levels
with high priority, small execution time limits (100
milliseconds), and increasingly larger core space limits (16,
32, 48 pages) are established. As each program request enters
from the terminal, it will move upward through these levels
each time it exceeds its core space limit. Whenever the
program exceeds its time limit at any of these levels, the core
space limit of that level is used as the estimate of the
program's present working set size. The program is then
considered to be a longer running program and its future
execution will be controlled by the looping set of table
levels. Whenever a program exceeds its largest space limit, the
largest allowable working set size (64 pages) will be used as
the first estimate for future execution under control of the
looping set.

When a program completes its execution, it is returned
to the initial starting set table level to await the next
input from the terminal.
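
The progression of a request through the starting set can be
sketched as a small state machine. The page limits, the
100-millisecond time limit, and the 64-page maximum are the
values given above; the event names and the function itself
are hypothetical and greatly simplify the actual level
transitions.

```python
CORE_LIMITS_PAGES = (16, 32, 48)  # core space limits of successive levels
TIME_LIMIT_MS = 100               # execution time limit at each level
MAX_WORKING_SET_PAGES = 64        # largest allowable working set size


def starting_set_outcome(events):
    """Follow one terminal request through the starting set.

    events is a sequence of hypothetical event names:
      "core" - the program exceeded the core space limit of its level
      "time" - the program exceeded the time limit of its level
      "done" - the program completed its execution
    Returns ("starting", None) when the program completes and returns to
    the initial level, or ("looping", estimate_in_pages) when it is handed
    to the looping set with a working set size estimate.
    """
    level = 0
    for event in events:
        if event == "done":
            return ("starting", None)
        if event == "time":
            return ("looping", CORE_LIMITS_PAGES[level])
        if event == "core":
            if level == len(CORE_LIMITS_PAGES) - 1:
                # Largest space limit exceeded: use the maximum estimate.
                return ("looping", MAX_WORKING_SET_PAGES)
            level += 1
        else:
            raise ValueError(f"unknown event: {event!r}")
    # No further events: treated here as completion of the request.
    return ("starting", None)
```

For example, a program that outgrows the 16-page level and
then runs past the time limit at the 32-page level enters the
looping set with an estimate of 32 pages.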

2.   The Looping Set

The table levels of the looping set perform the following
functions: they use the fields of the schedule table to
follow a program's working set size by regularly
overestimating and underestimating its time and core space
requirements in a minimal fashion in accordance with the
principle of balanced-core time; they cause the load that is
generated by long running programs to be spread out in time to
allow starting set entries to be processed rapidly;
furthermore, they optimize the CPU utilization, and thereby
penalize programs with poor paging characteristics, by causing
programs with minimal paging requirements to be selected to run
much more frequently than those with large paging requirements.
This penalty occurs only when the program has poor locality of
reference and a large working set size.

3.  The  AWAIT  Set 

The  AWAIT  set  is  a  special  set  of  table  levels  re- 
served for  tasks  doing  tape  I/O  and  other  kinds  of  AWAIT 
operations.   As  previously  described,  in  each  table  level 
there  is  an  AWAIT  extension  field,  which  is  an  elapsed  time 
interval  during  which  a  program's  current  working  set  pages 
are  kept  in  core  while  the  program  remains  idle  in  the  AWAIT 
state. This can cause severe elongations of real time
compared to virtual time, so tasks with smaller values of
virtual time are placed in this set of table levels rather
than tasks of the same working set size which are in the
looping set.

4.   The Holding Interlock Set

The  holding  interlock  set  is  also  a  special  set  that 
is  reserved  for  programs  that  are  currently  holding  inter- 
locks on  some  system  resource.   (Holding  an  interlock  means 
that  some  program  is  using  a  resource  and  preventing  other 
programs  from  using  that  resource.)   Programs  in  this  set 
are  given  high  priority  so  that  the  interlocked  resource 
may  be  quickly  released.   An  insignificant  change  in  the 
working  set  size  of  programs  operating  in  this  set  is 
assumed.

5.   The Waiting-For-Interlock Set

The  waiting-for-interlock  set  is  another  special  set 
of  levels  for  programs  that  are  waiting  for  interlocks  to  be 
released  that  are  currently  being  held  by  other  programs 
in  the  holding  interlock  set.   Until  the  interlock  is  re- 
leased, programs  in  this  set  of  table  levels  will  not  usually 
be  considered  for  dispatching.   An  insignificant  change  in 
the  working  set  size  is  also  assumed  for  the  interlock  set. 

V.  EXPERIMENTAL  PROCEDURE  AND  RESULTS 

In  order  to  make  a  number  of  test  runs  using  different 
schedule  tables,  it  was  first  necessary  to  provide  a  number 
of  programs  that  would  characterize  a  "realistic"  load  on 
the system relative to user demands at this school. This
was necessary since TSS/360 was no longer the current
time-sharing system in use at this computer installation, and a
fixed load was needed to make valid performance comparisons.

A.   DEVELOPMENT  OF  A  BENCHMARK 

As was previously discussed, designing a benchmark
for general purpose time-sharing systems is not an easy task
and is confounded by two factors. The first is
the variety of demands placed upon the system, and the second
is the stochastic behavior of a time-shared system. Arnold D.
Karush [Ref.14] presented an excellent discussion of the
development of a benchmark design for the ADEPT Time-Sharing
System at System Development Corporation, and pointed out
specific functional variables (compute activity, interactive
activity, I/O activity, page activity, response allocation,
user population, and swap activity) that affect system
performance, specifically response time and throughput.
Karush discusses two general program design techniques used
to measure the performance of time-sharing systems: the
analytical and stimulus methods. The analytical technique
involves the insertion of probes into the system running
under actual operating conditions. The stimulus technique
treats the system as a "black box" and involves applying a
controlled and measurable set of stimuli to the black box
to activate the functional variables and then observing the
effect of the stimuli upon the system.

The stimulus technique was used to develop the scripts
for the experimental tests used in this paper; specifically,
the scripts used a set of programs similar to the one used by
the CP/67 and TSS/360 Time-Sharing System comparison group
[Ref.15].

The final set of benchmark programs used in the test
runs was as follows:

PLILG  -  large  PL/I  compilation 

PLISM  -  small-sized  PL/I  compilation 

FORT  -  Fortran  program  that  is  compiled 

FORTEX  -  Fortran  program  that  is  executed 

EDIT  -  execute  routine  that  edits  a  simple  program  and 
files  the  edited  program 

PAGE  -  Fortran program which executes a large matrix
multiplication.

B.   MEASUREMENT TECHNIQUES

Two  types  of  performance  criteria  were  used  to  measure 
and  judge  the  improvements  in  performance.   The  measurement 
consisted  of  observing  the  response  times  and  throughput. 
The  benchmark  programs  used  in  the  tests  were  written  to  give 
the  real  time  at  the  commencement  and  at  the  completion  of 
a compilation. The throughput was calculated by observing
the number of completed compilations or executions of a
particular type of job. The figure obtained by this procedure
is called the throughput factor and was obtained as follows:

$$TP_i = \frac{SS}{RD \times NT_i}$$

where SS   = Sample Size (number of completed jobs)
      RD   = Run Duration
      NT_i = Number of terminals running program type i

In essence, the throughput factor is the reciprocal of the
time to execute the program, modified by the size of the
sample.
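The throughput factor defined above can be sketched in a few lines of Python. This is an illustration only: the function name, the choice of minutes as the unit of run duration, and the sample figures are assumptions and are not taken from the test runs.

```python
def throughput_factor(sample_size: int, run_duration: float, n_terminals: int) -> float:
    """Return TP_i = SS / (RD * NT_i).

    sample_size  -- SS, number of completed jobs of program type i
    run_duration -- RD, length of the run (unit assumed to be minutes)
    n_terminals  -- NT_i, terminals running program type i
    """
    if run_duration <= 0 or n_terminals <= 0:
        raise ValueError("run duration and terminal count must be positive")
    return sample_size / (run_duration * n_terminals)


# Hypothetical figures: 12 completed jobs in a 30-minute run on 4 terminals.
tp_example = throughput_factor(12, 30.0, 4)
```

A larger value means more completed jobs per terminal per unit of run time, so factors are comparable only between runs that use the same time unit.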

C. TOOLS FOR MEASURING PERFORMANCE

Unfortunately, no hardware or software measurement device
was available to measure resource utilization and performance
of TSS/360 in this research. A software measurement tool
called SIPE was obtained from IBM, but the required data
analysis programs could not be obtained. Thus the actual
measurements could be made, but there was no means of
converting them into meaningful information on resource
utilization. The problem of developing a data analysis
program to analyze the data from SIPE was considered beyond
the scope of this research.

D.   PRESENTATION  AND  DISCUSSION  OF  RESULTS 

Five  test  runs  were  conducted  using  different  schedule 
tables.   The  results  of  these  tests  will  be  presented  and 
discussed  in  this  section. 

The IBM 360 Model 67 configuration of the Naval Postgraduate
School is shown in Figure 4 and is very similar, but
not identical, to the IBM T.J. Watson Research Center's Model
67 configuration which Doherty used for his work. It should
be noted that when the TSS/360 Time-Sharing System was
implemented at this school for the months previously
mentioned, the new IBM Watson Research Table by Doherty was not
used. The initial schedule table used in TSS/360 is shown in
Appendix B (Figure B1). This table provided poor performance
to the user community. Just prior to TSS/360 being
replaced by the new CP/67 version time-sharing system, the
new IBM Research Schedule Table arrived and was implemented
by extending and using important parameters that were never
used in the old table. A significant improvement in
performance was observed. This improved schedule table is shown
in Appendix B (Figure B2). In fact, about a fifty percent
increase in utilization was observed, and yet it was clear
that more improvement could be obtained. It was not until
these tests were begun that the new IBM Research Table
(Figure B2 of Appendix B) was implemented and tested.

Figure 4. Naval Postgraduate School
IBM 360 Model 67

1. Test 1

Test  1  was  a  preliminary  test  in  which  the  benchmark 
programs  (or  scripts)  were  initially  used  and  in  which  the 
new  IBM  Watson  Research  Schedule  Table  (Figure  B2  of  Appen- 
dix B)  was  used.   The  load  configuration  and  performance 
statistics  for  this  test  can  be  seen  in  Figure  5.   Run  six, 
operating with a good sampling of all the scripts except
paging,  produced  a  mean  response  of  8  min  37  sec  for  a  large 
PL/1  compilation,  k   min  30  sec  for  a  small  PL/1  compilation, 
1  min  8  sec  for  a  Fortran  compilation  and  48  sec  for  an 
edit.   This  test  did  not  provide  a  heavy  load  to  the  system. 
This  table,  however,  did  provide  better  responses  than  were 
previously  observed  by  the  user  community  when  TSS/360  was 
running  on  a  regular  basis  using  the  old  schedule  table. 

2.  Test  2 

Test  2  was  conducted,  with  the  same  schedule  table 
used  in  Test  1,  to  provide  a  more  realistic  mix  with  different 
ratios  of  edit-to-run  (compile  and  execute)  programs  and 
heavier  load  on  the  system.   An  important  factor  to  remember 
in  scheduling  is  that  almost  any  scheduling  technique  will 
show  similar  results  under  light  loads,  but  it  is  only  when 
the  demand  for  system  resources  gets  large  that  scheduling 
differences  are  clearly  indicated.   The  run  durations  were 
also  lengthened  to  provide  a  steadier  load  on  the  system. 
The  load  characteristics  and  the  performance  statistics  for 
test 2 are shown in Figure 6. Under this change in load,
the response times increased significantly.

Figure 5. TSS/360 Test 1

Each test run was conducted under a terminal load of
27  users.   Runs  three  and  four  were  conducted  with  heavy 
paging,  and  as  a  result,  a  greater  delay  was  observed  in  the 
response  to  a  request.   It  was  believed  initially  from  the 
first  two  runs  that  the  PL/I  compiler  characteristics  produced 
the  heavy  load  and  the  poor  response,  but  when  several  heavy 
paging  programs  were  added  to  the  load,  the  performance  was 
degraded even more. Paging in TSS/360 is handled by disk as
well as drum, and since disk paging is slow, this might be
one of the major problems.

3.  Test  3 

When test 3 was performed, one of the three core
boxes failed. The results of this test indicate that TSS/360
operating with only two core boxes rather than three
produces much lower system performance, so low that the
results are meaningless for a comparison and are not included
in this thesis.

4.  Test  4 

Without  changing  the  schedule  table  of  test  2,  runs 
one  through  four  were  conducted  to  see  if  a  different  load 
would  change  the  performance  characteristics.   Run  four 
seemed  to  be  a  good  sampling  of  the  scripts  and  provided  a 
heavy paging load, and the performance characteristics were
about the same as those in run three of test 2. The load
conditions and performance statistics for run four are
shown in Figure 7.

Figure 7. TSS/360 Test 4
Load Conditions and Performance Statistics

Run five was conducted with the IBM Research
Schedule  Table  patch  altered.   This  modified  IBM  schedule 
table is shown in Appendix B (Figure B3). The table
parameters that were altered for this run are found in the
table  levels  of  the  Looping  Interactive  Sets  and  the  Start- 
ing Set  of  the  schedule  table,  since  these  sets  provide 
areas  in  which  the  most  improvements  in  performance  could 
be  realized.   Several  fields  of  the  schedule  table  levels 
were  altered,  but  none  were  changed  drastically.   This  was 
done  so  that  any  degradation  to  the  system  which  may  have 
occurred  from  changing  parameter  values  could  be  observed. 
The  fields  altered  and  the  reasons  for  the  alterations  were 
as  follows: 

The  delta-to-run  parameters  were  increased  so  that 
the  larger  working  set  size  programs  could  get  into  core 
faster  but  less  frequently  and  remain  there  longer  with 
larger  values  of  time-slice  end.   The  smaller  size  programs 
still  get  priority  through  the  system. 

The AWAIT extension field increases the time
allowed for the larger size programs to remain on the
dispatchable list before being forced to time slice. Since a
task in AWAIT status is normally removed from the list of
dispatchable tasks, and since this can cause a delay in
redispatching the task, the idea was to make the AWAIT extension
large enough to allow for completion of I/O operations.

A  few  priority  values  were  changed,  since  these 
priorities determine the position a task will assume within
the list of eligible tasks; that is, low priority numbers
are given precedence over higher priority numbers.

The Quantum Count and Quantum Length fields were
altered. These parameters determine the time slice, which
is dynamic, for tasks assigned to this entry. Time slice
duration equals Quantum Count times Quantum Length times 3.33
milliseconds. These fields were altered to see the effect
of the Balanced-Core Time Principle, under which the time slice
duration in terms of CPU execution time for a task is
inversely proportional to the working set size in that time
interval. This minimizes the elapsed time that any large
job can clog memory and allows jobs with good locality to
progress rapidly.
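The time slice arithmetic above can be illustrated with a short Python sketch. The first function follows the stated product directly. The second is only a hypothetical illustration of the inverse proportionality in the Balanced-Core Time Principle; the reference slice and page counts are assumed values, not entries from any schedule table in Appendix B.

```python
TIME_UNIT_MS = 3.33  # one schedule table time unit, as stated in the text


def time_slice_ms(quantum_count: int, quantum_length: int) -> float:
    """Time slice duration = quantum count * quantum length * 3.33 ms."""
    return quantum_count * quantum_length * TIME_UNIT_MS


def balanced_slice_ms(reference_slice_ms: float,
                      reference_pages: int,
                      working_set_pages: int) -> float:
    """Hypothetical helper: scale a reference slice so that the slice is
    inversely proportional to the working set size (in pages)."""
    if working_set_pages <= 0:
        raise ValueError("working set size must be positive")
    return reference_slice_ms * reference_pages / working_set_pages


# Assumed values: 8 quanta of 30 units each give a slice of about 799.2 ms.
slice_small = time_slice_ms(8, 30)
# A task with four times the reference working set gets a quarter of the slice.
slice_large = balanced_slice_ms(800.0, 16, 64)
```

Under this sketch a task with a large working set holds core for a shorter CPU interval, which is the effect the altered fields were intended to test.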

The maximum core page residency values (MAXCR) have
been selected to maximize task performance. Trivial and
many non-trivial commands require less than the 35 (23
hexadecimal) pages allowed in the small conversational levels.
However, some non-trivial commands take more pages, causing
the task to move to other levels. If a task with the Steal
Request Flag (SRF) on brings pages into core faster than they
can be released, it will exceed the MAXCR limit and be time
sliced.

The  maximum  relocations  per  quantum  field  was 
altered.   The  smaller  the  value,  the  greater  the  guarantee 
the  task  will  be  considered  I/O  bound  and  its  order  in  the 
dispatchable  list  will  not  change.   Therefore,  tasks  which 
must be serviced can remain on or near the top of the
dispatchable list by assigning them to levels with small MRQ
values.

The recompute flag field was altered. If tasks in
these levels fall behind schedule, they will be given
preference through the computation of their schedule start
time. If the preempt flag is on, a task can be time slice
ended if a higher priority task is ready and cannot be
dispatched.

The  scan  threshold  fields  were  reduced  in  value, 
since  it  was  felt  that  a  100%  page  stealing  value  was  not 
necessary.   The  scan  threshold  is  related  to  page  stealing. 
It should be noted that the stealing mechanism which sets
the steal flag was not implemented in the old schedule table
that was used initially with the system. This field value
was altered to allow page stealing.

As shown in Figure 7, by primarily increasing the
delta-to-run and quantum fields, the large working set size
programs (PLILG) were penalized in their response times,
whereas an improvement in response was observed in small
PL/I and Fortran compilations. However, in the EDIT
programs, response times were even worse during this run than
before and the throughput factor went down.

5. Test 5

The  last  test  was  conducted  using  three  different 
schedule  tables.   The  characteristics  for  this  test  are 
shown in Figure 8. Unfortunately, there were only 20
terminals loading the system, since the other terminals were
inaccessible or inoperable. The time was also limited for
these  test  runs  so  that  the  durations  were  shorter  than  was 
desirable.   The  schedule  table  for  run  three  is  shown  in 
Figure  B4  of  Appendix  B.   The  parameters  that  were  altered  for 
the  schedule  table  for  run  three  were  the  delta-to-run  fields, 
which  were  set  to  very  large  values,  and  the  quantum  fields. 
Although  the  load  was  not  as  heavy  as  that  of  test  4,  these 
test  runs  do  show  significant  improvement,  and  the  increased 
performance  is  the  result  of  judiciously  altering  these  para- 
meters.  The  response  times  for  PLILG  programs  for  runs  two 
and  three  were  about  the  same,  while  the  response  time  for 
FORT  programs  was  better  for  run  one  than  for  two  and  three. 
However, it is expected that if a heavier load had been
placed on the system, run three would have provided the
better performance statistics. The response times for small
size PL/I programs and EDIT programs for run three show
better response statistics, and the response time for FORT
programs for run three shows an increase over run two.
Figures 9, 10 and 11 show the difference in response times
for each of these runs. The throughput factor could not be
obtained for big and small PL/I programs, but run three
shows an increase in throughput over run two for FORT
programs but about the same as run one. For EDIT programs, run
three shows an increase in throughput over runs one and two.
Figures 12 and 13 show the difference in throughput.

VI. CONCLUSIONS AND RECOMMENDATIONS

The objectives of this paper were to organize all available
literature regarding improvement of performance measures
and techniques for the TSS/360 Time-Sharing System Schedule
Tables and to show that these principles and concepts could
be substantiated by performing experimental tests on the
computer. As a result of altering the parameters of the
TSS/360 schedule table, improved performance over the initial
system performance, when the TSS/360 system was in full
operation, was observed. From these tests it is evident that,
because of differences in the user community and in hardware
configurations, certain parameters in the table-driven
scheduler must be set for each installation to improve
its system performance and thus maintain a satisfied user
community.

It  has  been  shown  by  these  tests  that  the  Naval  Postgra- 
duate School's  Model  67  computer  could  support  about  20-25 
simultaneous  users  using  a  modified  IBM  Research  schedule 
table,  while  maintaining  a  fair  response  to  the  trivial  re- 
quests, and  simultaneously  servicing  large  users  rather  well. 
With  more  work  on  the  schedule  tables,  better  service  could 
be provided for a greater simultaneous load. Once the TSS/360
Time-Sharing System was removed as the installation's
time-sharing system, the time available for testing in this project
was restricted. Many more valuable tests remain to be
performed to eventually optimize the performance of the
system through the judicious alteration of the parameters of the
TSS/360 table-driven scheduler.

There were many fields of the TSS/360 schedule table that
were  not  varied  and  tested.   For  example,  during  the  last  test 
a  table  was  designed  to  test  the  page  drum  mechanism,  but 
since  this  mechanism  was  not  yet  implemented  into  the  soft- 
ware, this  table  could  not  be  employed.   The  present  values 
of  the  schedule  table  at  this  installation  show  a  0000  default 
to  the  system  calculated,  minimum  number  of  pages  on  disk  for 
all  users.   This  value  could  be  increased  to  allow  some 
tasks  to  be  allocated  greater  space  on  drum  in  order  that 
fewer  of  their  pages  have  to  be  moved  from  drum  to  disk. 
Nielsen [Ref. 16], in his simulation studies of time-sharing
systems, showed that disk paging can be very slow and can
reduce system performance substantially, and proposed that a
drum be used in place of the disk. Since this installation
used both drum and disk paging, an alternative solution could
be to purchase another drum for paging. Also, the disk
management algorithm could be revised.

As  mentioned  at  the  outset  of  this  paper,  a  more  flexible 
approach  to  evaluating  the  effects  of  changing  different 
schedule  table  parameters  on  the  performance  of  the  system 
would have been the simulation approach rather than an
analytical approach. However, such a simulation model would have
to be limited in terms of cost of design and running
time. Also, there is always the very difficult problem of
validating the simulation model.

The tests conducted in this paper show that attempting to
optimally tune the scheduler by trying various schedule tables
in the proper type of environment is not an easy process.
There were many factors which limited more speedy progress in
tuning the scheduler to the job load of this school's
environment. The benchmark that was implemented for the tests may
not have accurately represented the user community, although
a great effort in this direction was made. Since loads are
constantly changing, it is important to develop a methodology
for automatically producing scripts that are characteristic
of this installation and then to verify that they are accurate.

The use of the TSS/360 software measurement technique,
SIPE [Ref. 17], would have been very valuable and helpful in
establishing a good benchmark for developing, evaluating, and
improving the interactive system. SIPE and its data-reduction
program could also have been very helpful in evaluating
changes to the schedule tables and the effects on system
performance. These measurement tools could also provide
valuable statistics about each task as it is being processed
by the system. Software counters, as Doherty used, could
also provide information about each task as it migrates
through various levels of the schedule table to more
accurately verify the principles of working set size and balanced
core time. De Meis and Weizer [Ref. 18] established by
experimental means in developing RCA's Time Sharing Operating
System (TSOS) that by using certain measurement devices, the
working set size and balanced-core time of programs can be
monitored and verified.

Although SIPE produces some degradation to the system,
this is not considered serious. The only way to monitor a
system without altering its operation is by external
hardware monitors. Schulman [Ref. 19], for example, discusses a
hardware monitor (SPAR) that also is used to measure TSS/360
and that does not degrade that system. Another tool that
has been extremely useful in TSS/360 evaluation and
improvement of performance is the instruction-time trace monitor
(ITM) [Ref. 20], which is a combination of software and
hardware. With the aid of these additional measuring devices,
it is believed that many more improvements could be made to
the performance of TSS/360 by further adjustment of the
entries in the schedule table.

APPENDIX A
SCHEDULE TABLE PARAMETER DEFINITIONS

LEVEL  (STELEVEL),  1  BYTE 

Relative entry number in schedule table. The level number
is used for relative addressing within the schedule table.

PRIORITY  (STEPRIOR),  1  BYTE 

The priority of a level in conjunction with the Schedule
Start Time (SST) is used to govern the allocation of CPU
resources to a task. Only those tasks brought into the
dispatchable list can increase in core usage. Zero is the
highest priority. When seeking to bring a task into the
dispatchable list, the highest priority task behind schedule is
chosen. If no tasks are behind schedule, the highest priority
task is chosen.
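The selection rule just described can be sketched as follows. This is an illustration and not the TSS/360 code: the Task record, the test "SST not later than the present time" for being behind schedule, and the tie-break on earliest SST are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Task:
    name: str
    priority: int   # priority of the task's level; zero is the highest
    sst_ms: float   # Schedule Start Time, in milliseconds


def pick_next(eligible: Sequence[Task], now_ms: float) -> Optional[Task]:
    """Choose the task to bring into the dispatchable list."""
    if not eligible:
        return None
    # Assumption: a task is behind schedule once its SST has passed.
    behind = [t for t in eligible if t.sst_ms <= now_ms]
    pool = behind if behind else list(eligible)
    # Lowest priority number wins; earliest SST breaks ties (assumed).
    return min(pool, key=lambda t: (t.priority, t.sst_ms))


tasks = [Task("A", 1, 50.0), Task("B", 0, 200.0), Task("C", 3, 10.0)]
chosen_late = pick_next(tasks, now_ms=100.0)   # A and C are behind schedule
chosen_early = pick_next(tasks, now_ms=5.0)    # no task is behind schedule
```

With these sample values task A is chosen at time 100, although B has the higher priority, because B is not yet behind schedule; at time 5 no task is behind schedule and B is chosen.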

QUANTUM  LENGTH  (STETSVAL),  2  BYTES 

The quantum length is the number of time units (one
quantum) a task will be dispatched or the amount of time to
be used as a factor in determining how long a task will be
allowed to run before time-slice end. One unit represents
3.33 milliseconds. A quantum represents the maximum virtual
memory time that a task will be dispatched. The system will
then make a decision as to whether the task may have more CPU
time, based on the number of quanta used (see STEQUANT), or
is interrupted by a time-slice end.

MAXIMUM NUMBER OF QUANTA (STEQUANT), 1 BYTE

This field represents the maximum number of quanta
(STETSVAL) a task may use or receive when it is in execution
before a time-slice must occur.

MAXIMUM  PAGES  (STEMAXCR),  1  BYTE 

This  field  represents  the  maximum  number  of  private 
physical  pages  allowed  in  core  before  a  time-slice  end  or  page 
steal  will  occur.   (see  SCAN  THRESHOLD) 

MAXIMUM DISK I/O OR PAGE READS (STEMAXRD), 2 BYTES

This  field  represents  the  maximum  disk  reads  or  writes, 
or  maximum  number  of  page  relocations  a  task  will  be  allowed 
before  a  time-slice  end  will  occur. 

SCAN  THRESHOLD  (STEST),  1  BYTE 

If the steal request flag (STESRF) is on, the resident
supervisor will release some of a task's pages when the page
count equals STEMAXCR (maximum core page residency values).
The scan threshold is the percentage of STEMAXCR pages to be
retained. The scan threshold is a percentage specified in
hexadecimal (i.e., 80% = 80 base 10 = 50 base 16). When
stealing occurs, the task is not time-sliced, but stays in the
dispatchable list. However, the schedule table entry in the
TSI is changed to the value specified in STENSL (next steal
level).
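The hexadecimal encoding of the scan threshold and its effect on the number of pages retained can be illustrated in Python. The function names are hypothetical, and rounding the retained page count down is an assumption, since the rounding rule is not stated here.

```python
def scan_threshold_field(percent: int) -> str:
    """Return the one-byte scan threshold as two hexadecimal digits."""
    if not 0 <= percent <= 100:
        raise ValueError("scan threshold must be a percentage from 0 to 100")
    return format(percent, "02X")


def pages_retained(stemaxcr: int, percent: int) -> int:
    """Pages kept after a steal: percent of STEMAXCR, rounded down (assumed)."""
    return stemaxcr * percent // 100


field_80 = scan_threshold_field(80)   # "50", matching 80 base 10 = 50 base 16
kept = pages_retained(35, 80)         # 28 of 35 pages retained
```

With a threshold of 100 percent all STEMAXCR pages would be retained, so lowering the value is what allows a steal to release any pages.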
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PULSE  LEVEL  (STEPULSE),  1  BYTE 

This  field  represents  the  schedule  table  level  entry  to 
be  used  if  the  pulse  service  is  requested  by  the  user.  The 
pulse  service  allows  the  user  to  request  a  level  change. 

AWAIT  EXTENSION  (STEAWTEC),  2  BYTES 

This  field  represents  the  maximum  time  that  a  task,  issu- 
ing an  AWAIT  service,  is  allowed  to  remain  in  the  dispatch- 
able  list  while  waiting  for  an  I/O  operation  to  be  completed. 
The units are 3.33 milliseconds. If the I/O operation has
not  completed  before  the  time  limit  specified,  the  task  is 
time-sliced. 

DELTA-TO-RUN  TIME  (STEDELTA),  1  BYTE 

Specifies the real time interval at which a task is to
be  given  a  slice  of  CPU  time.   This  field  specifies  a  factor 
which  is  used  to  calculate  a  new  Schedule  Start  Time  (SST) 
for  a  task  as  it  moves  from  one  state  to  another;  i.e.,  as 
the  task  becomes  ready,  in  AWAIT  or  in  TWAIT.   The  value  in 
this  field  is  multiplied  by  852.5  milliseconds  and  may  be 
combined  with  the  master  clock  time  or  the  old  Scheduled 
Start  Time  if  the  old  SST  is  negative  to  determine  the  task's 
new SST. If delta-to-run equals zero, the SST is set to
zero and the task is automatically placed behind schedule.
(see RECOMPUTE FLAG)

(TSE) TIME-SLICE END (STETSEND), 1 BYTE

This  field  represents  the  schedule  table  level  entry  to 
be used when a time-slice end occurs because the maximum
number of quanta (STEQUANT) or the maximum disk I/O (STEMAXRD)
has been reached.

MAXIMUM  PAGES  TSE  (STEMPRE),  1  BYTE 

This  field  represents  the  schedule  table  level  entry  to 
be used when a time-slice end occurs because the maximum
pages in core (STEMAXCR) has been reached.

TWAIT TSE (STETWAIT), 1 BYTE

This field represents the schedule table level to be
used after a time-slice end occurs because the TWAIT service
has been used.

AWAIT  TSE  (STEAWAIT),  1  BYTE 

This  field  represents  the  schedule  table  level  entry  to 
be  used  after  a  time-slice  end  occurs  because  the  AWAIT  service 
has  been  used. 
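The four fields above each name the level a task moves to after a particular cause of time-slice end. A minimal Python sketch of that lookup follows; the dictionary representation of a schedule table entry and the level numbers in it are hypothetical and are not taken from the tables in Appendix B.

```python
from enum import Enum


class TseReason(Enum):
    """Cause of a time-slice end, mapped to the field that supplies the next level."""
    QUANTA_OR_DISK_IO = "STETSEND"   # maximum quanta or maximum disk I/O reached
    MAX_PAGES = "STEMPRE"            # maximum pages in core reached
    TWAIT = "STETWAIT"               # TWAIT service used
    AWAIT = "STEAWAIT"               # AWAIT service used


def next_level(entry: dict, reason: TseReason) -> int:
    """Return the schedule table level a task is assigned after a time-slice end."""
    return entry[reason.value]


# Hypothetical entry for one level; the level numbers are illustrative only.
sample_entry = {"STETSEND": 5, "STEMPRE": 9, "STETWAIT": 2, "STEAWAIT": 3}
level_after_paging = next_level(sample_entry, TseReason.MAX_PAGES)
```

The sketch shows why altering these fields changes how a task migrates through the table: each cause of time-slice end can route the task to a different level.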

RECOMPUTE  FLAG  (STERCMP),  1  BIT 

If the recompute flag is on, the task's Scheduled Start
Time is computed to place the task back on schedule as
described above under delta-to-run (STEDELTA). If the flag
is off, past performance (if behind schedule) is taken into
account by calculating SST as the present time plus
delta-to-run minus the amount behind schedule on the previous
time-slice. NOTE: When a task enters the eligible list
directly from the dispatchable list, the schedule start time
is calculated as if the recompute flag is off.
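The effect of the recompute flag on the Schedule Start Time can be sketched as follows. This is a simplified reading of the two definitions, under stated assumptions: with the flag on, the new SST is taken to be the present time plus the delta-to-run interval; the case in which the old SST is negative is not modelled; and the function and parameter names are hypothetical.

```python
DELTA_UNIT_MS = 852.5  # one delta-to-run unit, as given under STEDELTA


def next_sst_ms(now_ms: float, delta_to_run: int, recompute: bool,
                behind_ms: float = 0.0) -> float:
    """Return a new Schedule Start Time in milliseconds.

    behind_ms is the amount the task was behind schedule on the
    previous time-slice (zero if it was on schedule).
    """
    if delta_to_run == 0:
        # SST of zero places the task automatically behind schedule.
        return 0.0
    interval = delta_to_run * DELTA_UNIT_MS
    if recompute:
        # Flag on: the task is placed back on schedule (assumed now + interval).
        return now_ms + interval
    # Flag off: credit the task for the time it was behind schedule.
    return now_ms + interval - max(0.0, behind_ms)


sst_on = next_sst_ms(10000.0, 2, recompute=True, behind_ms=500.0)    # 11705.0
sst_off = next_sst_ms(10000.0, 2, recompute=False, behind_ms=500.0)  # 11205.0
```

With the flag off, a task that fell behind is scheduled earlier by the amount it was late, which is the preference described for tasks in levels that fall behind schedule.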

PRE-EMPT  FLAG  (STEPRMPT),  1  BIT 

A  task  on  the  dispatchable  list  whose  pre-empt  flag  is  on 
may  be  forced  to  time-slice  end  so  as  to  make  room  for  a  task 
from  the  eligible  list  having  a  higher  priority. 

STEAL REQUEST FLAG (STESRF), 1 BIT

A task on the dispatchable list whose steal request flag
is on will have pages released (stolen) when its private
pages in core reach the STEMAXCR limit. If pages are brought
in faster than they can be released so that the STEMAXCR
limit is exceeded, the task will be time-sliced.
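The interaction of the STEMAXCR limit with the steal request flag can be summarized in a small Python sketch. The function name and the three outcome labels are hypothetical; the decision rule is a direct reading of the definitions of MAXIMUM PAGES and STEAL REQUEST FLAG above.

```python
def maxcr_action(pages_in_core: int, stemaxcr: int, steal_flag: bool) -> str:
    """Return what happens to a task given its private pages in core."""
    if pages_in_core < stemaxcr:
        return "continue"      # below the limit, nothing happens
    if steal_flag and pages_in_core == stemaxcr:
        return "steal"         # pages are released, task stays dispatchable
    return "time_slice"        # limit exceeded, or reached with the flag off


actions = [
    maxcr_action(20, 35, True),    # continue
    maxcr_action(35, 35, True),    # steal
    maxcr_action(36, 35, True),    # time_slice
    maxcr_action(35, 35, False),   # time_slice
]
```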

MAXIMUM  PAGE  RELOCATIONS  PER  QUANTUM  (STEMRQ),  1  BYTE 

Specifies the maximum number of page relocation
interruptions allowed per quantum before the task is declared
paging bound; i.e., a task is considered to be execute bound if
its number of page relocations per quantum is less than or
equal to STEMRQ. Execute bound tasks are placed at the end
of the dispatchable list to allow non-execute-bound tasks to
overlap their paging I/O with execute bound tasks.
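The classification and the resulting ordering of the dispatchable list can be illustrated as follows. This is a simplified sketch: it applies a single STEMRQ value to every task, whereas each task would take the value from its own level, and the task tuples are hypothetical.

```python
from typing import List, Tuple


def is_execute_bound(relocations_in_quantum: int, stemrq: int) -> bool:
    """Execute bound if page relocations per quantum do not exceed STEMRQ."""
    return relocations_in_quantum <= stemrq


def order_dispatchable(tasks: List[Tuple[str, int]], stemrq: int) -> List[Tuple[str, int]]:
    """Place execute bound tasks at the end, keeping relative order otherwise.

    Each task is a (name, page relocations in the last quantum) pair.
    """
    # sorted() is stable and False sorts before True, so paging bound
    # tasks come first and execute bound tasks follow.
    return sorted(tasks, key=lambda t: is_execute_bound(t[1], stemrq))


sample = [("A", 2), ("B", 9), ("C", 0), ("D", 12)]
ordered_names = [name for name, _ in order_dispatchable(sample, stemrq=4)]
```

With STEMRQ set to 4, tasks B and D are paging bound and move ahead of A and C. A smaller STEMRQ makes it more likely that a task is treated as paging bound and keeps its place near the top of the list.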

HOLDING  INTERLOCK  CHANGE  LEVEL  (STEHLCK),  1  BYTE 

This  field  represents  the  schedule  table  level  entry  to 
be  used  when  a  time-slice  end  occurs  (except  for  AWAIT  or 
TWAIT) and the task is holding a Virtual Access Method (VAM)
interlock.

LOW-CORE HOLDING INTERLOCK (STELCHL), 1 BYTE

This  field  represents  the  schedule  table  level  entry  to 
be  used  when  a  time-slice  end  occurs  because  of  low-core  and 
the  task  is  holding  a  Virtual  Access  Method  (VAM)  interlock. 

WAITING ON INTERLOCK CHANGE LEVEL (STEWLCK), 1 BYTE

This field represents the schedule table level entry to
be used when a time-slice end occurs and the task is waiting
on an interlock.

CONVERSATIONAL  WRITE  ONLY  (STECWO),  1  BYTE 

This field represents the schedule table entry to be used
when a write without response message is sent to the terminal.
The level change occurs without a time-slice end.

LOW CORE FORCED TIME-SLICE END (STELCF), 1 BYTE

This  field  represents  the  schedule  table  entry  to  be 
used  when  a  task  is  forced  to  time-slice  end  for  low-core 
and  it  is  not  holding  an  interlock. 

PREJUDICE  CATEGORY  3  (STEPRJ3),  1  BYTE 

This  field  is  not  used  in  the  system. 

NEXT STEAL LEVEL (STENSL), 1 BYTE

This  field  represents  the  schedule  table  entry  to  be  used 
when  stealing  occurs.   The  task  is  not  time-sliced. 

DRUM  SHARE  (STEDSH),  2  BYTES 

This  is  the  number  of  drum  pages  reserved  for  a  task. 
There are about 500 pages available after startup on a one
drum system and 1^00 pages on a two drum system. In general,
the  number  of  a  task's  private  pages  on  drum  is  a  function  of 
the  number  of  tasks  logged  on,  the  number  of  drums,  and  the 
time  since  the  last  time-slice.   If  the  number  of  unassigned 
drum  pages  falls  below  a  pre-determined  limit,  some  pages 
are  moved  from  drum  to  disk.   Each  task  receives  a  system 
calculated  minimum  drum  space.   The  drum  share  field  allows 
some  tasks  to  keep  a  large  drum  share.   A  value  of  zero 
defaults  to  the  system  calculated  minimum. 
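The default behavior of the drum share field can be shown in a brief Python sketch. The function name is hypothetical, and the treatment of a non-zero value that is smaller than the system calculated minimum is an assumption, since that case is not described above.

```python
def effective_drum_share(stedsh: int, system_minimum: int) -> int:
    """Return the number of drum pages reserved for a task.

    stedsh         -- value of the drum share field (zero means default)
    system_minimum -- the system calculated minimum drum space
    """
    if stedsh < 0 or system_minimum < 0:
        raise ValueError("page counts cannot be negative")
    if stedsh == 0:
        return system_minimum
    # Assumption: a task never receives less than the system minimum.
    return max(stedsh, system_minimum)


default_share = effective_drum_share(0, 20)    # falls back to the minimum
larger_share = effective_drum_share(64, 20)    # explicit larger share
```

A non-zero value is how the field would let selected tasks keep more of their pages on drum, so that fewer pages have to be moved from drum to disk.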
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APPENDIX  B 


SCHEDULE  TABLES  USED  FOR  DIFFERENT  TESTS 
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